SVM-based Automatic Annotation of Multiple Sequence Alignments
Abstract
Multiple sequence alignment is a critical step in phylogeny inference, yet no existing approach both 1) finds the best global alignment and 2) automates and reproduces manual editing. Progressive alignment is an effective method for multiple sequence alignment, but its practical use has long been hampered because maximizing the alignment score can align regions that are not homologous. Standard practice in phylogenetics therefore involves manual editing of alignments, which is a non-trivial task. To address these problems, this study 1) uses a support vector machine (SVM) that captures the neighborhood of each site to automate and reproduce manual editing, and 2) builds a procedure for SVM model training and automatic annotation. Experimental results demonstrate that an SVM-based classifier can reproduce the manual editing tasks with an accuracy of 95.5%. The method is stable with respect to both parameters of the RBF-kernel SVM (gamma and C) and clearly outperforms GBLOCKS and AL2CO, two conventional editing/annotation methods: its classification accuracy is consistently much higher than theirs.
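As a rough illustration of what "capturing the neighborhood of a site" could look like, the sketch below builds one feature vector per alignment column from the column itself and its neighbors, and evaluates an RBF kernel between two such vectors. The per-column features (gap fraction and Shannon entropy), the window half-width, the padding rule, and all function names are assumptions made for this example; the abstract does not state which features or window size the study uses.

```python
import math
from collections import Counter


def column_features(column):
    """Return (gap_fraction, shannon_entropy) for one alignment column.

    Both features are illustrative assumptions, not the paper's feature set.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return (1.0, 0.0)
    gap_fraction = column.count("-") / len(column)
    total = len(residues)
    entropy = sum(
        (k / total) * math.log2(total / k) for k in Counter(residues).values()
    )
    return (gap_fraction, entropy)


def site_windows(alignment, half_width=2):
    """Build one feature vector per site from the site and its neighbors."""
    length = len(alignment[0])
    columns = ["".join(seq[i] for seq in alignment) for i in range(length)]
    per_column = [column_features(c) for c in columns]
    pad = (1.0, 0.0)  # assumption: positions past either end count as all-gap
    vectors = []
    for i in range(length):
        vec = []
        for j in range(i - half_width, i + half_width + 1):
            vec.extend(per_column[j] if 0 <= j < length else pad)
        vectors.append(vec)
    return vectors


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel exp(-gamma * ||x - y||^2); gamma is the kernel width parameter."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Toy alignment (hypothetical data): three sequences, five columns.
alignment = ["ACG-T", "ATGAT", "AC--T"]
vectors = site_windows(alignment, half_width=2)
similarity = rbf_kernel(vectors[0], vectors[3])
```

In a full pipeline, each vector would be paired with a keep/remove label taken from a manually edited alignment and passed to an SVM trainer with an RBF kernel, where C sets the penalty for misclassified training sites and gamma sets the kernel width.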
Related articles
Reproducing the manual annotation of multiple sequence alignments using a SVM classifier
MOTIVATION: Aligning protein sequences with the best possible accuracy requires sophisticated algorithms. Since the optimal alignment is not guaranteed to be the correct one, it is expected that even the best alignment will contain sites that do not respect the assumption of positional homology. Because formulating rules to identify these sites is difficult, it is common practice to manually rem...
A CAD System Framework for the Automatic Diagnosis and Annotation of Histological and Bone Marrow Images
Due to ever increasing of medical images data in the world’s medical centers and recent developments in hardware and technology of medical imaging, necessity of medical data software analysis is needed. Equipping medical science with intelligent tools in diagnosis and treatment of illnesses has resulted in reduction of physicians’ errors and physical and financial damages. In this article we pr...
Automatic assessment of alignment quality
Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant,...
Problems and pitfalls of automatic gene annotation, gene collection, domain prediction, and sequence alignment
Because of the following problems within the automatic gene annotation process it is absolutely necessary to manually check and annotate all genes. Almost every myosin gene prediction and its translation produced by the automatic processes contains errors derived from including intronic sequence and leaving out exons, as well as wrong predictions of start and termination sites. It is also absol...
Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data
Recently, genome-wide surveys for non-coding RNAs have provided evidence for tens of thousands of previously undescribed evolutionary conserved RNAs with distinctive secondary structures. The annotation of these putative ncRNAs, however, remains a difficult problem. Here we describe an SVM-based approach that, in conjunction with a non-stringent filter for consensus secondary structu...
Journal title:
- JCP
Volume 9, Issue
Pages -
Publication date: 2014